    • Analysis and evaluation of MapReduce solutions on an HPC cluster

      Veiga, Jorge; Expósito, Roberto R.; Taboada, Guillermo L.; Touriño, Juan (Pergamon Press, 2016-02)
      [Abstract] The ever growing needs of Big Data applications are demanding challenging capabilities which cannot be handled easily by traditional systems, and thus more and more organizations are adopting High Performance ...
    • BDEv 3.0: energy efficiency and microarchitectural characterization of Big Data processing frameworks 

      Veiga, Jorge; Enes, Jonatan; Expósito, Roberto R.; Touriño, Juan (Elsevier BV * North-Holland, 2018-09)
      [Abstract] As the size of Big Data workloads keeps increasing, the evaluation of distributed frameworks becomes a crucial task in order to identify potential performance bottlenecks that may delay the processing of large ...
    • Enabling Hardware Affinity in JVM-Based Applications: A Case Study for Big Data 

      Expósito, Roberto R.; Veiga, Jorge; Touriño, Juan (Springer, 2020)
      [Abstract] Java has been the backbone of Big Data processing for more than a decade due to its interesting features such as object orientation, cross-platform portability and good programming productivity. In fact, most ...
    • Evaluation and optimization of Big Data Processing on High Performance Computing Systems 

      Veiga, Jorge (2018)
      [Abstract] Nowadays, many organizations use Big Data technologies to extract information from large volumes of data. As the size of these volumes grows, satisfying the performance demands of applications ...
    • Flame-MR: An event-driven architecture for MapReduce applications 

      Veiga, Jorge; Expósito, Roberto R.; Taboada, Guillermo L.; Touriño, Juan (Elsevier BV * North-Holland, 2016)
      [Abstract] Nowadays, many organizations analyze their data with the MapReduce paradigm, most of them using the popular Apache Hadoop framework. As the data size managed by MapReduce applications is steadily increasing, the ...
    • MarDRe: efficient MapReduce-based removal of duplicate DNA reads in the cloud 

      Expósito, Roberto R.; Veiga, Jorge; González-Domínguez, Jorge; Touriño, Juan (Oxford University Press, 2017)
      [Abstract] This article presents MarDRe, a de novo cloud-ready duplicate and near-duplicate removal tool that can process single- and paired-end reads from FASTQ/FASTA datasets. MarDRe takes advantage of the widely adopted ...
    • MREv: An Automatic MapReduce Evaluation Tool for Big Data Workloads 

      Veiga, Jorge; Expósito, Roberto R.; Taboada, Guillermo L.; Touriño, Juan (Elsevier, 2015)
      [Abstract] The popularity of Big Data computing models like MapReduce has caused the emergence of many frameworks oriented to High Performance Computing (HPC) systems. The suitability of each one to a particular use case ...
    • Optimization of Real-World MapReduce Applications With Flame-MR: Practical Use Cases 

      Veiga, Jorge; Expósito, Roberto R.; Raffin, Bruno; Touriño, Juan (Institute of Electrical and Electronics Engineers, 2018-11-12)
      [Abstract] Apache Hadoop is a widely used MapReduce framework for storing and processing large amounts of data. However, it presents some performance issues that hinder its utilization in many practical use cases. Although ...
    • Performance Evaluation of Big Data Frameworks for Large-Scale Data Analytics 

      Veiga, Jorge; Expósito, Roberto R.; Pardo, Xoán C.; Taboada, Guillermo L.; Touriño, Juan (IEEE Computer Society, 2017-02-06)
      [Abstract] The increasing adoption of Big Data analytics has led to a high demand for efficient technologies in order to manage and process large datasets. Popular MapReduce frameworks such as Hadoop are being replaced by ...
    • The HPS3 Service: Reduction of Cost and Transfer Time for Storing Data on Clouds 

      Veiga, Jorge; Taboada, Guillermo L.; Pardo, Xoán C.; Touriño, Juan (IEEE Computer Society, 2015-03-12)
      [Abstract] In the past several years, organizations have been changing their storage methods as the volume of data they managed has increased. The cloud computing paradigm offers new ways of storing data based on scalability ...